Free Solved [December 2022] BCS40 - Statistical Techniques Question Paper

1. In a study on the per capita income for a particular year in a city, the following weekly observations were made: (5 marks)
| Per Capita Income (₹, 1 k = 1000) | Number of Weeks |
|---|---|
| 14 k–15 k | 5 |
| 15 k–16 k | 10 |
| 16 k–17 k | 20 |
| 17 k–18 k | 9 |
| 18 k–19 k | 6 |
| 19 k–20 k | 2 |

Draw a histogram and frequency polygon on the same scale.

Answer:

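To draw the histogram, mark the income intervals on the x-axis and the number of weeks on the y-axis, then draw adjacent bars (no gaps) of heights 5, 10, 20, 9, 6 and 2 over the six intervals. For the frequency polygon, plot each frequency against the midpoint of its interval (14.5 k, 15.5 k, 16.5 k, 17.5 k, 18.5 k, 19.5 k) on the same axes and join the points with straight lines. The polygon is usually closed by joining its ends to the x-axis at 13.5 k and 20.5 k, the midpoints of the empty neighbouring classes.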
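The plotting data can be prepared with a few lines of Python. This is only a sketch: the names `classes`, `weeks` and `polygon` are illustrative, and the text bars stand in for a proper chart.

```python
# Class intervals (in thousands of rupees) and weekly frequencies from the table.
classes = [(14, 15), (15, 16), (16, 17), (17, 18), (18, 19), (19, 20)]
weeks = [5, 10, 20, 9, 6, 2]

width = classes[0][1] - classes[0][0]
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Frequency polygon vertices: one point per class midpoint, closed on the
# x-axis one class width beyond each end.
polygon = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, weeks))
    + [(midpoints[-1] + width, 0)]
)

# Rough text histogram: one '#' per week.
for (lo, hi), f in zip(classes, weeks):
    print(f"{lo}k-{hi}k | {'#' * f} ({f})")
print(polygon)
```

The `polygon` list can be passed to any plotting tool as the line to overlay on the bars.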

2. A problem of statistics techniques is given to three students A, B and C whose chances of solving it are $\frac{1}{2}$, $\frac{1}{3}$ and $\frac{1}{4}$ respectively. (5 marks)

(i) What is the probability that the problem will be solved?

(ii) What is the probability that only one of them will solve the problem correctly?

Answer:

(i) The probability that the problem will be solved is given by:

$$P(\text{Solved}) = 1 - P(\text{Not Solved})$$

The probability that each student does not solve the problem is:

$$P(A') = \frac{1}{2}, \, P(B') = \frac{2}{3}, \, P(C') = \frac{3}{4}$$

Therefore, assuming the three students work independently,

$$P(\text{Not Solved}) = P(A') \cdot P(B') \cdot P(C') = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} = \frac{1}{4}$$

So,

$$P(\text{Solved}) = 1 - \frac{1}{4} = \frac{3}{4}$$

(ii) The probability that only one of them solves the problem is:

$$P(\text{Only one solves}) = P(A \cdot B' \cdot C') + P(A' \cdot B \cdot C') + P(A' \cdot B' \cdot C)$$

Calculating each:

$$P(A \cdot B' \cdot C') = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} = \frac{1}{4}$$

$$P(A' \cdot B \cdot C') = \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{3}{4} = \frac{1}{8}$$

$$P(A' \cdot B' \cdot C) = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{1}{4} = \frac{1}{12}$$

Adding them up over the common denominator 24:

$$P(\text{Only one solves}) = \frac{1}{4} + \frac{1}{8} + \frac{1}{12} = \frac{6 + 3 + 2}{24} = \frac{11}{24}$$
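Both parts can be checked by enumerating all eight solve/fail outcomes with exact fractions. The sketch below assumes the three students work independently; the helper `outcome_probability` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Individual chances of solving the problem, from the question.
p = {"A": Fraction(1, 2), "B": Fraction(1, 3), "C": Fraction(1, 4)}

def outcome_probability(outcome):
    """Probability of one (A solved?, B solved?, C solved?) outcome,
    assuming independence."""
    prob = Fraction(1)
    for student, solved in zip(p, outcome):
        prob *= p[student] if solved else 1 - p[student]
    return prob

outcomes = list(product([True, False], repeat=3))

# (i) at least one student solves it; (ii) exactly one student solves it.
p_solved = sum(outcome_probability(o) for o in outcomes if any(o))
p_exactly_one = sum(outcome_probability(o) for o in outcomes if sum(o) == 1)

print(p_solved, p_exactly_one)
```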

3. Determine mean and median for the following data: (5 marks)

| Marks | No. of Students |
|---|---|
| 0–10 | 10 |
| 10–20 | 9 |
| 20–30 | 25 |
| 30–40 | 30 |
| 40–50 | 16 |
| 50–60 | 10 |

Answer:

To determine the mean:

The midpoint ($x_i$) for each interval is:

0–10: 5, 10–20: 15, 20–30: 25, 30–40: 35, 40–50: 45, 50–60: 55

Multiply the midpoint by the number of students ($f_i$) for each interval and sum:

$$N = 10 + 9 + 25 + 30 + 16 + 10 = 100$$

$$\sum x_i f_i = 5 \cdot 10 + 15 \cdot 9 + 25 \cdot 25 + 35 \cdot 30 + 45 \cdot 16 + 55 \cdot 10 = 50 + 135 + 625 + 1050 + 720 + 550 = 3130$$

Mean $\bar{x}$:

$$\bar{x} = \frac{\sum x_i f_i}{N} = \frac{3130}{100} = 31.3$$

To find the median:

The cumulative frequency (C.F) for each interval is:

10, 19, 44, 74, 90, 100

The median is in the interval 30–40, since it contains the 50th observation.

Use the formula:

$$\text{Median} = L + \left( \frac{\frac{N}{2} - \text{C.F}}{f} \right) \cdot h$$

where $L$ is the lower boundary of the median class, C.F is the cumulative frequency before the median class, $f$ is the frequency of the median class, and $h$ is the class width.

$$L = 30, \, \text{C.F} = 44, \, f = 30, \, h = 10$$

$$\text{Median} = 30 + \left( \frac{50 - 44}{30} \right) \cdot 10 = 32$$
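The grouped mean and median can be recomputed directly from the frequency table. This is a minimal sketch using the midpoint method for the mean and the interpolation formula for the median; the variable names are illustrative.

```python
# Class intervals and frequencies from the marks table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [10, 9, 25, 30, 16, 10]

n = sum(freqs)

# Grouped mean: weight each class midpoint by its frequency.
mean = sum((lo + hi) / 2 * f for (lo, hi), f in zip(classes, freqs)) / n

# Grouped median: find the first class whose cumulative frequency
# reaches N/2, then interpolate inside it.
cum_before = 0
for (lo, hi), f in zip(classes, freqs):
    if cum_before + f >= n / 2:
        median = lo + (n / 2 - cum_before) / f * (hi - lo)
        break
    cum_before += f

print(mean, median)
```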

4. Box A contains 5 red and 4 blue balls, Box B contains 2 red and 5 blue balls. A ball is drawn at random from each box. Find the probability that one is red and the other is blue. (5 marks)

Answer:

We need to find the probability that one ball is red and the other is blue.

The possible combinations are:

1. Red from Box A and Blue from Box B

2. Blue from Box A and Red from Box B

The probability for each combination is:

$P(\text{Red from A}) = \frac{5}{9}$ and $P(\text{Blue from B}) = \frac{5}{7}$

So, the probability of the first combination:

$$P(\text{Red from A and Blue from B}) = \frac{5}{9} \cdot \frac{5}{7} = \frac{25}{63}$$

The probability for the second combination:

$P(\text{Blue from A}) = \frac{4}{9}$ and $P(\text{Red from B}) = \frac{2}{7}$

So, the probability of the second combination:

$$P(\text{Blue from A and Red from B}) = \frac{4}{9} \cdot \frac{2}{7} = \frac{8}{63}$$

The two combinations are mutually exclusive, so adding both probabilities gives us:

$$P(\text{One Red and One Blue}) = \frac{25}{63} + \frac{8}{63} = \frac{33}{63} = \frac{11}{21}$$
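The same calculation in exact arithmetic, as a short sketch (the dictionaries `box_a`, `box_b` and the helper `p` are names chosen here for illustration; the two draws are taken to be independent):

```python
from fractions import Fraction

# Contents of each box, from the question.
box_a = {"red": 5, "blue": 4}
box_b = {"red": 2, "blue": 5}

def p(box, colour):
    """Probability of drawing the given colour from a box."""
    return Fraction(box[colour], sum(box.values()))

# One red and one blue: (red from A, blue from B) or (blue from A, red from B).
p_mixed = p(box_a, "red") * p(box_b, "blue") + p(box_a, "blue") * p(box_b, "red")

print(p_mixed)  # 11/21
```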

5. A statistics professor has given four tests. A student scored 75, 65, 80, and 95 respectively in the four tests. The professor decides to determine his grade by randomly selecting a sample of 2 test scores. Construct the sampling distribution for this process. (5 marks)

Answer:

The possible samples of 2 test scores are:

(75, 65), (75, 80), (75, 95), (65, 80), (65, 95), (80, 95)

The mean of each sample is calculated as follows:

  • $(75 + 65) / 2 = 70$
  • $(75 + 80) / 2 = 77.5$
  • $(75 + 95) / 2 = 85$
  • $(65 + 80) / 2 = 72.5$
  • $(65 + 95) / 2 = 80$
  • $(80 + 95) / 2 = 87.5$

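Each of the $\binom{4}{2} = 6$ samples (drawn without replacement) is equally likely, so each sample mean occurs with probability $\frac{1}{6}$. The sampling distribution of the mean is:

| Sample | Mean | Probability |
|---|---|---|
| (75, 65) | 70 | 1/6 |
| (75, 80) | 77.5 | 1/6 |
| (75, 95) | 85 | 1/6 |
| (65, 80) | 72.5 | 1/6 |
| (65, 95) | 80 | 1/6 |
| (80, 95) | 87.5 | 1/6 |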
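The enumeration can be reproduced with `itertools.combinations`. The sketch assumes sampling without replacement with every pair equally likely, which is what the six listed samples imply; `distribution` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = [75, 65, 80, 95]

# All unordered samples of size 2, drawn without replacement.
samples = list(combinations(scores, 2))

# Count how often each sample mean occurs, then convert counts to probabilities.
counts = Counter(sum(s) / 2 for s in samples)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, prob in distribution.items():
    print(m, prob)
```

A useful sanity check is that the mean of the sample means equals the population mean of the four scores, 78.75.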

6. Find and plot the regression line of y on x for the data given below: (10 marks)

| Speed (km/hr) (x) | Stopping distance (in feet) (y) |
|---|---|
| 30 | 160 |
| 40 | 240 |
| 50 | 330 |
| 60 | 435 |

Answer:

We will use the formula for the regression line $y = a + bx$ where:

  • $b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
  • $a = \frac{\sum y - b\sum x}{n}$

Here, we calculate the necessary sums:

  • $\sum x = 30 + 40 + 50 + 60 = 180$
  • $\sum y = 160 + 240 + 330 + 435 = 1165$
  • $\sum xy = (30 \cdot 160) + (40 \cdot 240) + (50 \cdot 330) + (60 \cdot 435) = 4800 + 9600 + 16500 + 26100 = 57000$
  • $\sum x^2 = 30^2 + 40^2 + 50^2 + 60^2 = 900 + 1600 + 2500 + 3600 = 8600$

With $n = 4$ we get:

$$b = \frac{4 \cdot 57000 - 180 \cdot 1165}{4 \cdot 8600 - 180^2} = \frac{228000 - 209700}{34400 - 32400} = \frac{18300}{2000} = 9.15$$

$$a = \frac{1165 - 9.15 \cdot 180}{4} = \frac{1165 - 1647}{4} = \frac{-482}{4} = -120.5$$

The regression equation is:

$$y = -120.5 + 9.15x$$

To plot the regression line, calculate $y$ for values of $x$:

  • $x = 30: \, y = -120.5 + 9.15 \cdot 30 = 154$
  • $x = 40: \, y = -120.5 + 9.15 \cdot 40 = 245.5$
  • $x = 50: \, y = -120.5 + 9.15 \cdot 50 = 337$
  • $x = 60: \, y = -120.5 + 9.15 \cdot 60 = 428.5$

Plot these points on the same axes as the observed data and join them with a straight line to draw the regression line.
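The least-squares coefficients can be recomputed from the raw data with the sum formulas for $b$ and $a$. This is a plain-Python sketch with illustrative variable names; no plotting library is assumed.

```python
# Speed (km/hr) and stopping distance (feet) from the table.
x = [30, 40, 50, 60]
y = [160, 240, 330, 435]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)

# Slope and intercept of the regression line of y on x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Fitted values at the observed speeds; these are the points to plot.
fitted = [a + b * xi for xi in x]

print(f"y = {a:.2f} + {b:.2f}x")
print(fitted)
```

Because the line is fitted by least squares with an intercept, the fitted values sum to the same total as the observed $y$ values, which gives a quick arithmetic check.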

7. In a partially destroyed laboratory, the legible record of analysis of correlation of data, is as follows: (10 marks)

Variance of x = 9, Regression equations:

(i) 8x – 10y + 66 = 0

(ii) 4x – 18y – 214 = 0

What were (a) the means of x and y, (b) the coefficient of correlation between x and y and (c) the standard deviation of y?

Answer:

(a) The means of x and y:

The regression equations can be written as:

  • $8x - 10y = -66$
  • $4x - 18y = 214$

Both regression lines pass through the point $(\bar{x}, \bar{y})$, so the means are found by solving the two equations simultaneously:

  • From $8x - 10y = -66$, we get $x = \frac{10y - 66}{8} = \frac{5y - 33}{4}$
  • Substituting in $4x - 18y = 214$:
  • $4 \left(\frac{5y - 33}{4}\right) - 18y = 214$
  • $5y - 33 - 18y = 214$
  • $-13y = 247$
  • $y = -19$
  • Substitute $y = -19$ back in to find $x$:
  • $x = \frac{5(-19) - 33}{4} = \frac{-95 - 33}{4} = \frac{-128}{4} = -32$

Thus, the means are $\bar{x} = -32$ and $\bar{y} = -19$.

(b) The coefficient of correlation between x and y:

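The coefficient of correlation $r$ satisfies $r^2 = b_{yx} \cdot b_{xy}$, so the product of the two regression coefficients cannot exceed 1. This decides which equation is which:

  • If (i) were the line of y on x and (ii) the line of x on y, we would have $b_{yx} = \frac{8}{10} = 0.8$ and $b_{xy} = \frac{18}{4} = 4.5$, giving $b_{yx} \cdot b_{xy} = 3.6 > 1$, which is impossible.
  • So (i) is the regression of x on y: $x = \frac{10}{8}y - \frac{66}{8}$, giving $b_{xy} = \frac{5}{4} = 1.25$
  • And (ii) is the regression of y on x: $y = \frac{4}{18}x - \frac{214}{18}$, giving $b_{yx} = \frac{2}{9}$

Both coefficients are positive, so $r$ is positive:

$$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{\frac{2}{9} \cdot \frac{5}{4}} = \sqrt{\frac{5}{18}} \approx 0.527$$

(c) The standard deviation of y:

We know the variance of x is 9, hence standard deviation of x is:

$$\sigma_x = \sqrt{9} = 3$$

Using $b_{yx} = r \frac{\sigma_y}{\sigma_x}$ with $r \approx 0.527$:

$$\sigma_y = \frac{b_{yx} \cdot \sigma_x}{r} = \frac{\frac{2}{9} \cdot 3}{0.527} \approx 1.265$$

Equivalently, $\sigma_y^2 = \frac{b_{yx}}{b_{xy}} \sigma_x^2 = \frac{2/9}{5/4} \cdot 9 = 1.6$, so $\sigma_y = \sqrt{1.6} \approx 1.265$.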

8. (a) Compare simple random sampling with replacement and simple random sampling without replacement. (2 marks)

Answer:

Simple random sampling with replacement: Each member of the population can be chosen more than once. The sample is drawn randomly and replaced before the next draw. This method allows for the same element to be included multiple times in the sample, leading to independent sampling. The sample size can be the same as or larger than the population size.

Simple random sampling without replacement: Each member of the population can only be chosen once. Once an element is selected, it is not returned to the population pool. This method ensures that each element appears only once in the sample, leading to dependent sampling. The sample size cannot exceed the population size.

(b) Define time series and discuss various components of time series. (4 marks)

Answer:

Time series: A time series is a sequence of data points collected or recorded at specific time intervals, often in chronological order. It is used to track changes over time and identify patterns, trends, and cycles.

Components of time series:

  • Trend: The long-term movement or direction in the data. It shows the general pattern or tendency of the data over a long period, whether it is increasing, decreasing, or remaining constant.
  • Seasonal variation: Regular and predictable changes in the data that occur within a specific period, such as daily, monthly, or annually. These variations are usually caused by seasonal factors like weather, holidays, or events.
  • Cyclical variation: Fluctuations in the data occurring over longer periods due to economic or business cycles. These variations are not fixed and can span several years.
  • Irregular variation: Random or unpredictable changes in the data caused by unforeseen events, such as natural disasters, strikes, or political events. These variations do not follow a pattern and are usually temporary.

(c) Write short notes on the following: (2+2 marks)

Answer:

(i) t-test: A t-test is a statistical test used to determine if there is a significant difference between the means of two groups. It helps to assess whether the differences observed in the sample are statistically significant and not due to random chance. The t-test is commonly used in hypothesis testing to compare the means of two samples or a sample mean with a known value. There are different types of t-tests, including independent t-test, paired t-test, and one-sample t-test.

(ii) Properties of a good estimator: A good estimator should have the following properties:

  • Unbiasedness: The estimator should not systematically overestimate or underestimate the true parameter value. The expected value of the estimator should be equal to the parameter being estimated.
  • Consistency: The estimator should converge to the true parameter value as the sample size increases. Larger samples should provide more accurate estimates.
  • Efficiency: Among all unbiased estimators, the estimator should have the smallest variance, providing more precise estimates.
  • Sufficiency: The estimator should utilize all the information available in the data relevant to estimating the parameter.

9. The table given below shows the relation between the performance of students in Statistics and Computer Sciences. Test the hypothesis that the performance in Statistics is independent of the performance in Computer Sciences using 5% level of significance. (Given that $\chi^2_{0.05, 4} = 9.49$). (10 marks)

| Statistics \ Computer Science | High Grade | Medium Grade | Low Grade |
|---|---|---|---|
| High Grade | 36 | 72 | 42 |
| Medium Grade | 34 | 122 | 44 |
| Low Grade | 50 | 56 | 44 |

Answer:

Observed Frequencies:

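| | High | Medium | Low | Total |
|---|---|---|---|---|
| High | 36 | 72 | 42 | 150 |
| Medium | 34 | 122 | 44 | 200 |
| Low | 50 | 56 | 44 | 150 |
| Total | 120 | 250 | 130 | 500 |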

Expected Frequencies: $E_{ij} = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$.

| | High | Medium | Low |
|---|---|---|---|
| High | $150 \cdot 120 / 500 = 36$ | $150 \cdot 250 / 500 = 75$ | $150 \cdot 130 / 500 = 39$ |
| Medium | $200 \cdot 120 / 500 = 48$ | $200 \cdot 250 / 500 = 100$ | $200 \cdot 130 / 500 = 52$ |
| Low | $150 \cdot 120 / 500 = 36$ | $150 \cdot 250 / 500 = 75$ | $150 \cdot 130 / 500 = 39$ |

Calculating $\chi^2$:

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

  • For High, High: $\frac{(36 - 36)^2}{36} = 0$
  • For High, Medium: $\frac{(72 - 75)^2}{75} = 0.12$
  • For High, Low: $\frac{(42 - 39)^2}{39} \approx 0.23$
  • For Medium, High: $\frac{(34 - 48)^2}{48} \approx 4.08$
  • For Medium, Medium: $\frac{(122 - 100)^2}{100} = 4.84$
  • For Medium, Low: $\frac{(44 - 52)^2}{52} \approx 1.23$
  • For Low, High: $\frac{(50 - 36)^2}{36} \approx 5.44$
  • For Low, Medium: $\frac{(56 - 75)^2}{75} \approx 4.81$
  • For Low, Low: $\frac{(44 - 39)^2}{39} \approx 0.64$

$$\chi^2 \approx 0 + 0.12 + 0.23 + 4.08 + 4.84 + 1.23 + 5.44 + 4.81 + 0.64 \approx 21.40$$

The degrees of freedom are $(3 - 1)(3 - 1) = 4$, which matches the given critical value.

Conclusion:

The calculated $\chi^2$ value of about 21.40 is greater than the critical value of 9.49. Therefore, we reject the null hypothesis and conclude that the performance in Statistics is not independent of the performance in Computer Sciences.
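The whole test can be recomputed from the observed table in a few lines. This is a plain-Python sketch with illustrative names; the critical value 9.49 is the one given in the question, not computed here.

```python
# Observed frequencies: rows = Statistics grade, columns = Computer Science grade
# (High, Medium, Low in both directions).
observed = [
    [36, 72, 42],
    [34, 122, 44],
    [50, 56, 44],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic summed over all nine cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

dof = (len(row_totals) - 1) * (len(col_totals) - 1)
critical = 9.49  # chi-square critical value at 5% with 4 d.f., as given
reject_independence = chi2 > critical

print(expected)
print(round(chi2, 2), dof, reject_independence)
```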